Abstract

Motivated by the approach of random linear codes, a new distance in the vector space over a
finite field is defined as the logarithm of the "surface area" of a Hamming ball with radius being
the corresponding Hamming distance. It is named entropy distance because of its close relation
with entropy function. It is shown that entropy distance is a metric for a non-binary field and
a pseudometric for the binary field. The entropy distance of a linear code is defined to be the
smallest entropy distance between distinct codewords of the code. Analogues of the Gilbert bound,
the Hamming bound, and the Singleton bound are derived for the largest size of a linear code given
the length and entropy distance of the code. Furthermore, as an important property related to
lossless joint source-channel coding, the entropy distance of a linear encoder is defined. Very tight
upper and lower bounds are obtained for the largest entropy distance of a linear encoder with given
dimensions of input and output vector spaces.
1 Introduction

The aim of channel coding theory is to find effective ways of combating noise so that information can
be transmitted reliably and quickly. One of the most important topics in this field is about linear codes
with large minimum distance, because large minimum distance implies good error-correcting capability
(see e.g., [ffl2]).

Let $\mathbb{F}_q$ be a finite field of order $q = p^r$, where $p$ is prime and $r \ge 1$. The vector space of all $n$-tuples
over $\mathbb{F}_q$ is denoted by $\mathbb{F}_q^n$. We usually write a vector in $\mathbb{F}_q^n$ in the row-vector form $\mathbf{x} = (x_1, x_2, \ldots, x_n)$,
and for $c \in \mathbb{F}_q$ we denote by $\mathbf{c}$ the all-$c$ vector in $\mathbb{F}_q^n$. The (Hamming) distance $d_H(\mathbf{x}, \mathbf{y})$ between
$\mathbf{x}, \mathbf{y} \in \mathbb{F}_q^n$ is defined to be the number of coordinates in which $\mathbf{x}$ and $\mathbf{y}$ differ. In particular, we define
the (Hamming) weight $\mathrm{wt}(\mathbf{x})$ of $\mathbf{x} \in \mathbb{F}_q^n$ as $d_H(\mathbf{x}, \mathbf{0})$. An $[n, k]$ linear code $C$ over $\mathbb{F}_q$ is a $k$-dimensional
subspace of $\mathbb{F}_q^n$, and a vector in $C$ is called a codeword of $C$. The (minimum) distance of $C$ is defined
to be the minimum of distances between distinct codewords of $C$, or equivalently, the minimum weight
of nonzero codewords of $C$. Then an $[n, k]$ linear code with distance $d$ is usually denoted as an $[n, k, d]$
linear code.
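The definitions above can be made concrete with a small sketch. It assumes a prime field (arithmetic modulo q) and enumerates all codewords by brute force, so it is only meant for tiny codes; the function names and the generator matrix `G_HAMMING` (one possible generator of the binary [7, 4, 3] Hamming code) are illustrative choices, not notation from this paper.

```python
from itertools import product

def hamming_weight(x):
    """Number of nonzero coordinates of x."""
    return sum(1 for xi in x if xi != 0)

def hamming_distance(x, y):
    """Number of coordinates in which x and y differ."""
    return sum(1 for xi, yi in zip(x, y) if xi != yi)

def codewords(G, q):
    """Yield every codeword xG of the code generated by G over F_q (q prime)."""
    k, n = len(G), len(G[0])
    for msg in product(range(q), repeat=k):
        yield tuple(sum(m * G[i][j] for i, m in enumerate(msg)) % q
                    for j in range(n))

def minimum_distance(G, q):
    """Minimum weight of a nonzero codeword (brute force)."""
    return min(hamming_weight(c) for c in codewords(G, q) if any(c))

# Illustrative generator matrix of the binary [7, 4, 3] Hamming code.
G_HAMMING = [
    [1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
]

d = minimum_distance(G_HAMMING, 2)
print("minimum distance:", d)
```

By linearity, the minimum over nonzero codeword weights equals the minimum distance between distinct codewords, which is why the sketch never compares pairs.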

The significance of minimum distance is related to a classical channel model called the binary
symmetric channel (BSC). Over a BSC, the optimum decoding rule is to decode to the codeword closest
(in Hamming distance) to the received $n$-tuple, so a linear code with distance $d$ can correct $\lfloor (d - 1)/2 \rfloor$
or fewer channel errors. Note that the amount of information that a linear code carries is characterized
by its dimension $k$ or the rate $k/n$, so one goal of coding theory is to determine the largest rate of a
linear code with a given distance (or the largest distance of a linear code with a given rate). There are
countless papers on this topic (including nonlinear codes), but so far, there is still a large gap between
the best known asymptotic lower bound and asymptotic upper bound on the rate of codes (see e.g.,
[T|[3l4T0] and the references therein).

This is a strange phenomenon, because on the channel coding problem, information theory has
provided very tight asymptotic lower and upper bounds which in fact coincide at the point called channel
capacity (see e.g., [EE]). This implies that the coding problem based on the distance of linear codes has
diverged from its original motivation for reliable transmission in the sense of information theory. On the
other hand, we note that the approach of random linear codes (usually using a uniformly distributed
random matrix) is frequently used in theory to construct capacity-approaching coding schemes or linear
codes with large distance (see e.g., [T2T[To] and the references therein). If the approach of random
linear codes is in the correct direction, as the author believes, then do we need to rethink
the distance of a linear code? Is it a good criterion of error-correcting capability? Or can we learn
something valuable from the random-linear-code approach?

These questions motivate this paper, which will present a new distance of (linear) code called entropy
distance. Roughly speaking, the entropy distance between $\mathbf{x}, \mathbf{y} \in \mathbb{F}_q^n$ is defined as the logarithm of the
"surface area" of a sphere with radius $d_H(\mathbf{x}, \mathbf{y})$, and the entropy distance of a linear code is defined in a
similar way to Hamming distance. A linear code with large entropy distance must have large (Hamming)
distance, but not vice versa. Furthermore, we shall define the entropy distance of a linear encoder, an
interesting property related to lossless joint source-channel coding. Several lower and upper bounds
about entropy distance of linear codes and linear encoders are derived, and concrete examples with
large entropy distance are also provided. In the case of linear encoders, the lower and upper bounds on
entropy distance turn out to be very tight.

The rest of this paper is organized as follows. In Section 2, we revisit the sphere packing problem
in an information-theoretic manner (by the approach of random linear codes). In this process, we
propose a sufficient condition (called "white" condition) for universal packing. To some extent, the
minimum (Hamming) distance of a linear code is a simplification of the "white" condition. As another
simplification, entropy distance is defined. In Section 3, we investigate the properties of entropy distance
of linear codes. A lower bound and two upper bounds on the largest size of a linear code with a given
entropy distance are derived. In Section 4, we go further to define and study the entropy distance of a
linear encoder. An upper bound and a lower bound on the largest entropy distance of a linear encoder
are derived in terms of the dimensions of input and output vector spaces. Concluding remarks are given
in Section 5.

In the sequel, the multiplicative subgroup of nonzero elements of $\mathbb{F}_q$ is denoted by $\mathbb{F}_q^*$. The group
of all permutations of the set $\{1, 2, \ldots, n\}$ is denoted by $S_n$. Each $\sigma \in S_n$ together with each $\mathbf{v} \in (\mathbb{F}_q^*)^n$
induces a monomial map $m_{\sigma,\mathbf{v}} : \mathbb{F}_q^n \to \mathbb{F}_q^n$ given by $\mathbf{x} \mapsto (v_1 x_{\sigma^{-1}(1)}, \ldots, v_n x_{\sigma^{-1}(n)})$. In particular, $m_{\sigma,\mathbf{1}}$ is
called coordinate permutation and is also denoted $\sigma$ for convenience. The set of all monomial maps of
$\mathbb{F}_q^n$ is denoted by $\mathfrak{M}(\mathbb{F}_q^n)$.

For convenience of notation, we define $aA := \{a\mathbf{x} : \mathbf{x} \in A\}$ and $\mathbf{v} + A = A + \mathbf{v} := \{\mathbf{v} + \mathbf{x} : \mathbf{x} \in A\}$
for $a \in \mathbb{F}_q$, $\mathbf{v} \in \mathbb{F}_q^n$, and $A \subseteq \mathbb{F}_q^n$.

An $m$-by-$n$ matrix over a field is written as $M = (M_{i,j})_{m \times n}$ where $M_{i,j}$ denotes the $(i, j)$th entry.
The transpose of $M$ is denoted by $M^T$. The $n \times n$ identity matrix is denoted $I_n$.

The identity function on a set $A$ is denoted $\mathrm{id}_A : A \to A$ (given by $x \mapsto x$). For a subset $B$ of $A$,
the indicator function $1_B : A \to \{0, 1\}$ is given by $x \mapsto 1$ for $x \in B$ and $x \mapsto 0$ for $x \notin B$. When the
expression of $B$ is long, we write $1B$ in place of $1_B(x)$.
For $x \in [0, 1]$, we define the Hilbert entropy function by

$$H_q(x) := x \log_q (q - 1) - x \log_q x - (1 - x) \log_q (1 - x)$$

with the convention $0 \log_q 0 = 0$. By $H_q^{-1}$ we mean the inverse of $H_q$ from $[0, 1]$ to $[0, 1 - q^{-1}]$.
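As a numerical companion to this definition, the following sketch evaluates $H_q$ and its inverse on the increasing branch $[0, 1 - q^{-1}]$. The function names and the use of bisection for the inverse are assumptions made here for illustration; the paper does not prescribe any particular method.

```python
import math

def hilbert_entropy(x, q):
    """H_q(x) with the convention 0 * log_q(0) = 0."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    total = x * math.log(q - 1, q)
    for p in (x, 1.0 - x):
        if p > 0.0:
            total -= p * math.log(p, q)
    return total

def hilbert_entropy_inverse(y, q, iterations=100):
    """Inverse of H_q on [0, 1 - 1/q], where H_q is increasing (bisection)."""
    if not 0.0 <= y <= 1.0:
        raise ValueError("y must lie in [0, 1]")
    lo, hi = 0.0, 1.0 - 1.0 / q
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if hilbert_entropy(mid, q) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(hilbert_entropy(0.5, 2))
print(hilbert_entropy_inverse(0.5, 2))
```

The maximum value 1 is attained at $x = 1 - q^{-1}$, which is why the inverse is taken on that interval.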

The floor function $\lfloor x \rfloor$ and ceiling function $\lceil x \rceil$ of a real number $x$ are defined to be the largest
integer not greater than $x$ and the smallest integer not less than $x$, respectively.

Following the usual convention, we always mean Hamming distance when we say distance, and 
entropy distance should always be stated explicitly. 

2 Motivation and Definition 

The essence of channel coding is related to a concept called sphere packing. To some extent, it
corresponds to the partition induced by an optimal channel decoder. Let $g : \mathbb{F}_q^n \to \mathbb{F}_q^n$ be a decoder;
then the partition $\{g^{-1}(\mathbf{x}) : \mathbf{x} \in \mathbb{F}_q^n\}$ of $\mathbb{F}_q^n$ may be regarded as some kind of "sphere" packing. However,
the balls here are generally irregular and heterogeneous, or should not be called balls at all.

In coding theory, we usually consider the packing problem of balls in Hamming distance. In the space
$\mathbb{F}_q^n$, a sphere with center $\mathbf{x} \in \mathbb{F}_q^n$ and integer radius $r$ is the set of all vectors which are all the same distance
$r$ from $\mathbf{x}$. The sphere together with its interior is called a ball, i.e., the set $\{\mathbf{x}' \in \mathbb{F}_q^n : d_H(\mathbf{x}', \mathbf{x}) \le r\}$.
The problem of finding the largest rate of codes with distance $d$ is equivalent to the problem of finding
the maximum number of balls of radius $d/2$ that can be packed into the space $\mathbb{F}_q^n$. Obviously, this kind
of balls is so regular that a large proportion of the space is wasted in a general case. It is by no means
the kind of sphere packing that information theory expects.

If we think in a manner more analogous to information theory, for example, if we allow the ball
to contain some holes as long as the total volume of holes is negligible in a certain sense, then the situation
changes drastically. By the approach of random linear codes, we shall show that in this new sense,
there are linear codes whose sphere-packing radius is almost as high as their distance, and that this
kind of linear codes is characterized by a weight distribution that has almost the same shape as the
function $\binom{n}{i}(q - 1)^i$, the "surface area" of a sphere with radius $i$. For an $[n, k]$ linear code $C$ over $\mathbb{F}_q$, its
weight distribution is a vector $(A_0(C), \ldots, A_i(C), \ldots, A_n(C))$ where $A_i(C)$ is the number of codewords
of weight $i$ in $C$. The next two propositions establish the existence of such a linear code.
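To make the comparison between a weight distribution and the "surface area" function tangible, here is a brute-force sketch for small codes over a prime field. The helper names and the sample generator matrix (a binary [7, 4] Hamming code) are illustrative assumptions; the generator matrix is assumed to have full rank so that each codeword is counted once.

```python
from itertools import product
from math import comb

def surface_area(n, i, q):
    """Number of vectors of weight i in F_q^n: C(n, i) * (q - 1)^i."""
    return comb(n, i) * (q - 1) ** i

def weight_distribution(G, q):
    """(A_0, ..., A_n) of the code generated by a full-rank G over F_q (q prime)."""
    k, n = len(G), len(G[0])
    A = [0] * (n + 1)
    for msg in product(range(q), repeat=k):
        c = [sum(m * G[i][j] for i, m in enumerate(msg)) % q for j in range(n)]
        A[sum(1 for s in c if s)] += 1
    return A

# Illustrative example: a binary [7, 4] Hamming code.
G = [
    [1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
]
A = weight_distribution(G, 2)
for i, a in enumerate(A):
    # Ratio of codewords of weight i to all vectors of weight i.
    print(i, a, a / surface_area(7, i, 2))
```

For a code whose weight distribution follows the surface-area shape, the printed ratios would all be close to $q^{-(n-k)}$; the Hamming code deviates visibly at weights 0 and 7.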

Proposition 2.1 (cf. P3MII]). For $n \ge k \ge 1$, there is an $[n, k]$ linear code $C$ such that

$$A_i(C) < n q^{-(n-k)} \binom{n}{i} (q - 1)^i \quad \forall i = 1, 2, \ldots, n. \tag{1}$$

Proposition 2.2. Let $C$ be an $[n, k]$ linear code satisfying (1) and $S$ a subset of $\mathbb{F}_q^n$ containing $\mathbf{0}$. If
$|S| < q^{n-k}/(2n)$, then there exist $f \in \mathfrak{M}(\mathbb{F}_q^n)$ and $B = B(f) \subseteq S$ such that

(1) $\mathbf{0} \in B$,

(2) $|B| \ge |S|(1 - 2n q^{-(n-k)} |S|)$,

(3) The family $\{B_{\mathbf{c}}\}_{\mathbf{c} \in C}$ of sets $B_{\mathbf{c}} := \mathbf{c} + f(B)$ is pairwise disjoint.

In particular, if $S$ is invariant under any monomial map and $|S| < q^{n-k}/n$, then there exists $B \subseteq S$
such that

(1)' $S' := \{\mathbf{s} \in S : \binom{n}{\mathrm{wt}(\mathbf{s})} (q - 1)^{\mathrm{wt}(\mathbf{s})} < q^{n-k}/n\} \subseteq B$,

(2)' $|B| \ge |S|[1 - n q^{-(n-k)} (|S| - |S'|)]$,

(3)' The family $\{B_{\mathbf{c}}\}_{\mathbf{c} \in C}$ of sets $B_{\mathbf{c}} := \mathbf{c} + B$ is pairwise disjoint.

Proposition 2.2 is more general than what we need, so let us give some explanation.
It follows from Proposition 2.1 that

$$d_H(C) \ge \min\left\{ i : \binom{n}{i} (q - 1)^i > \frac{q^{n-k}}{n} \right\}.$$

Because $\binom{n}{i}(q - 1)^i \le q^{n H_q(i/n)}$ (see Lemma A.1), we have

$$\frac{d_H(C)}{n} \ge \delta := H_q^{-1}\left(1 - \frac{k}{n} - \frac{\log_q n}{n}\right), \tag{2}$$

which is the well-known fact that random linear codes achieve the asymptotic Gilbert-Varshamov (GV)
bound [hehhlh].

Let $S$ be a ball in $\mathbb{F}_q^n$ with center $\mathbf{0}$ and radius $r = \lfloor \delta n - \epsilon(n) \rfloor$. The size of $S$ is

$$|S| = \sum_{i=0}^{r} \binom{n}{i} (q - 1)^i \le q^{n H_q(r/n)} \le q^{n H_q(\delta - \epsilon(n)/n)}$$

by Lemma A.1. Because $H_q(x)$ is concave,

$$H_q(x) \le H_q(x_0) + H_q'(x_0)(x - x_0) = H_q(x_0) + (x - x_0) \log_q \frac{(q - 1)(1 - x_0)}{x_0},$$

so that

$$q^{n H_q(\delta - \epsilon(n)/n)} \le q^{n H_q(\delta) + \epsilon(n) \log_q \gamma} = n^{-1} q^{n-k} \gamma^{\epsilon(n)},$$

where

$$\gamma := \frac{\delta}{(q - 1)(1 - \delta)}. \tag{3}$$

This bound combined with Proposition 2.2 yields the next corollary.

Corollary 2.3. Let $C$ be an $[n, k]$ linear code satisfying (1) and $S$ a ball in $\mathbb{F}_q^n$ with center $\mathbf{0}$ and radius
$r = \lfloor \delta n - \epsilon(n) \rfloor$, where $\delta$ is defined by (2) and $\epsilon(n) > 0$. Then there exists $B \subseteq S$ such that $\mathbf{0} \in B$,
$|B| \ge |S|(1 - \gamma^{\epsilon(n)})$, and the family $\{\mathbf{c} + B\}_{\mathbf{c} \in C}$ of sets is pairwise disjoint, where $\gamma$ is defined by (3).
(In particular, if we take $\epsilon(n) = \log_q n$, we obtain a "rough sphere" packing with radius about $\delta n$, almost
as large as the distance of $C$.$^1$)
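For a feel of the quantities in this discussion, the sketch below evaluates a relative radius of the form $\delta = H_q^{-1}(1 - k/n - (\log_q n)/n)$ and the ratio $\gamma = \delta/[(q-1)(1-\delta)]$ for sample parameters. Both formulas are taken here as assumptions read off from the derivation around (2) and (3); the function names and the sample values $n = 1000$, $k = 500$, $q = 2$ are purely illustrative.

```python
import math

def hq(x, q):
    """q-ary (Hilbert) entropy function with 0 * log 0 = 0."""
    total = x * math.log(q - 1, q)
    for p in (x, 1.0 - x):
        if p > 0.0:
            total -= p * math.log(p, q)
    return total

def hq_inv(y, q, iterations=100):
    """Inverse of hq on [0, 1 - 1/q] by bisection."""
    lo, hi = 0.0, 1.0 - 1.0 / q
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if hq(mid, q) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def gv_parameters(n, k, q):
    """Return (delta, gamma) under the assumed forms of (2) and (3)."""
    target = 1.0 - k / n - math.log(n, q) / n
    if target <= 0.0:
        raise ValueError("rate too high for this bound to be meaningful")
    delta = hq_inv(target, q)
    gamma = delta / ((q - 1) * (1.0 - delta))
    return delta, gamma

delta, gamma = gv_parameters(1000, 500, 2)
print("delta =", delta, "gamma =", gamma)
```

Since $\delta < 1 - q^{-1}$ whenever the target entropy is below 1, the ratio $\gamma$ is below 1, so $\gamma^{\epsilon(n)}$ shrinks as $\epsilon(n)$ grows.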

1 "Rough sphere" packing differs from list decoding in that there is no uniform restriction on the number of codewords
within distance $r$ from every vector in $\mathbb{F}_q^n$ although most vectors have at most one codeword at distance $r$ or less from
them.
Note that the maximum possible size of a "ball" for packing is $q^{n-k}$, so the above "rough sphere"
packing is asymptotically the best that we can do. This implies that when we are seeking linear codes
with large distance, we should also be careful to check their packing radius of "rough sphere", especially
those codes exceeding the GV bound. Recall that the packing radius of a so-called perfect code is only
about one half of its distance, and it cannot be improved by "rough sphere" packing.

It is natural to ask why a linear code satisfying (1) has a large packing radius of "rough sphere".
Clearly, it is due to the shape of the weight distribution. The weight distribution of a linear code
is more important than its distance. However, we have little knowledge about other kinds of weight
distributions that also enable a linear code to have good capability of "rough sphere" packing.$^2$ But
note that Proposition 2.2 indicates that linear codes satisfying (1) have a remarkable capability of packing in a
more general sense, that is, it allows the shape of the filler $S$ to be arbitrary. This may be called universal
packing, and in fact it is well known that random linear codes are universal for channel coding, an
intrinsic property that can be found in almost every information-theoretic proof based on random linear
codes (see e.g., [15]). Using a similar terminology in signal processing, we call an $[n, k]$ linear code a
"white" code if its weight distribution is close to the shape of $\binom{n}{i}(q - 1)^i$, roughly in the form

$$\frac{A_i(C)}{\binom{n}{i}(q - 1)^i} \approx q^{-(n-k)} \quad \forall i = 1, 2, \ldots, n, \tag{4}$$

a main part of (1). A "white" code is difficult to "attack", even by a deliberately designed noise, because
the codewords of such a code are uniformly spread in the spectrum (i.e., weight distribution) and hence
can easily avoid the attack of noise by randomly choosing a monomial map.

Now that a "white" code is so good, why not use this criterion in code design? However,
determining the weight distribution of a linear code is a very difficult task, which makes the criterion
impractical. The success of minimum distance of linear codes is partly because it is easier to compute.
In fact, computing minimum distance of a linear code is equivalent to determining the leftmost weight
segment in which the weight distribution is zero, and we note that if the distance does not exceed the
GV bound, the weight distribution in that segment happens to be "white" by (4). Then it is natural
to ask if there is another "zero" weight segment that also coincides with the "white" condition (1). By
checking (4), it is easy to find that there is possibly a rightmost weight segment in which the weight
distribution is zero. So as a compromise between minimum distance and (4), we may design a criterion
that tracks the leftmost and rightmost weight segments of zero weight distribution. But in order to
track these two segments, we would need two parameters, say $(d_1, d_2)$, for example. Can we find only
one parameter to track both of these two segments? Yes, we can. It is entropy distance.

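Definition 2.4. The entropy distance $d_E(\mathbf{x}, \mathbf{y})$ between $\mathbf{x}, \mathbf{y} \in \mathbb{F}_q^n$ is defined by

$$d_E(\mathbf{x}, \mathbf{y}) := h_{q,n}(\mathrm{wt}(\mathbf{x} - \mathbf{y})),$$

where

$$h_{q,n}(i) := \log_q \left[ \binom{n}{i} (q - 1)^i \right] \quad \text{for } i = 0, 1, \ldots, n.$$

The entropy weight $h(\mathbf{x})$ of $\mathbf{x} \in \mathbb{F}_q^n$ is defined as $d_E(\mathbf{x}, \mathbf{0})$.$^3$ The entropy distance $d_E(C)$ of a linear code
$C$ is defined to be the smallest entropy distance between distinct codewords of $C$, or equivalently, the
minimum entropy weight of nonzero codewords of $C$.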
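Definition 2.4 translates almost directly into code. The sketch below assumes a prime field and brute-force enumeration of codewords; the function names and the two sample generator matrices (a binary [7, 3] simplex code and a binary [7, 4] Hamming code) are illustrative choices made here.

```python
from itertools import product
from math import comb, log

def h(q, n, i):
    """h_{q,n}(i) = log_q [ C(n, i) (q - 1)^i ]."""
    return log(comb(n, i) * (q - 1) ** i, q)

def entropy_weight(x, q):
    """Entropy weight of a vector x over F_q."""
    return h(q, len(x), sum(1 for s in x if s))

def entropy_distance(x, y, q):
    """Entropy distance between x and y over F_q (q prime)."""
    return entropy_weight([(a - b) % q for a, b in zip(x, y)], q)

def code_entropy_distance(G, q):
    """Minimum entropy weight over nonzero codewords generated by G."""
    k, n = len(G), len(G[0])
    best = None
    for msg in product(range(q), repeat=k):
        c = [sum(m * G[i][j] for i, m in enumerate(msg)) % q for j in range(n)]
        if any(c):
            w = entropy_weight(c, q)
            best = w if best is None else min(best, w)
    return best

# Columns of G_SIMPLEX are the seven nonzero vectors of F_2^3.
G_SIMPLEX = [
    [1, 0, 0, 1, 1, 0, 1],
    [0, 1, 0, 1, 0, 1, 1],
    [0, 0, 1, 0, 1, 1, 1],
]
G_HAMMING = [
    [1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
]
print(code_entropy_distance(G_SIMPLEX, 2))
print(code_entropy_distance(G_HAMMING, 2))
```

The binary Hamming code contains the all-one vector, whose entropy weight is zero, so a code with good Hamming distance can still have entropy distance zero.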

2 One of the candidates might be polar codes [TB] although their minimum distances are asymptotically bad. 

3 A different definition of entropy weight is given in [T7] based on a variant of complete weight. They are similar 
but motivated by different random coding techniques, i.e., random monomial map and random coordinate permutation. 
Obviously, the definition in this paper is better because monomial maps include coordinate permutations as a proper 
subset. 
At first glance, the definition of entropy distance may seem very artificial, but the next propositions
will convince the reader that this definition is so natural that it qualifies as a metric or pseudometric.
The name "entropy distance" comes from the property (3) in Proposition 2.5.



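Proposition 2.5. Let $q \ge 2$, $n \ge 1$, and $0 \le i \le n$.

(1) $0 \le h_{q,n}(i) < n$.

(2) Let $x_0 := [(q - 1)n - 1]/q$ and then

$$h_{q,n}(i) \lesseqgtr h_{q,n}(i + 1) \quad \text{for } i \lesseqgtr x_0.$$

The function $h_{q,n}(i)$ has one or two maxima at $i = \lceil x_0 \rceil, \lfloor x_0 \rfloor + 1$.

(3) For $\alpha \in [0, 1]$,

$$\lim_{n \to \infty} \frac{1}{n} h_{q,n}(\lfloor \alpha n \rfloor) = H_q(\alpha).$$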
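Properties (2) and (3) of Proposition 2.5 are easy to probe numerically. The sketch below reads property (2) as locating the maximizers of $h_{q,n}$ at $\lceil x_0 \rceil$ and $\lfloor x_0 \rfloor + 1$ with $x_0 = [(q-1)n-1]/q$, and property (3) as the limit $h_{q,n}(\lfloor \alpha n \rfloor)/n \to H_q(\alpha)$; the helper names are assumptions, and exact integer arithmetic is used for the maximizers to avoid floating-point ties.

```python
from math import comb, log

def surface_area(q, n, i):
    """q ** h_{q,n}(i) as an exact integer."""
    return comb(n, i) * (q - 1) ** i

def maximizers(q, n):
    """Set of indices i where h_{q,n}(i) is largest."""
    areas = [surface_area(q, n, i) for i in range(n + 1)]
    best = max(areas)
    return {i for i, a in enumerate(areas) if a == best}

def predicted_maximizers(q, n):
    """{ceil(x0), floor(x0) + 1} for x0 = ((q - 1) n - 1) / q, in integers."""
    num = (q - 1) * n - 1
    return {-(-num // q), num // q + 1}

def hilbert_entropy(x, q):
    total = x * log(q - 1, q)
    for p in (x, 1.0 - x):
        if p > 0.0:
            total -= p * log(p, q)
    return total

def normalized_h(q, n, i):
    """h_{q,n}(i) / n, computed from the exact integer surface area."""
    return log(surface_area(q, n, i)) / log(q) / n

print(maximizers(2, 7), predicted_maximizers(2, 7))
print(normalized_h(3, 20000, 5000), hilbert_entropy(0.25, 3))
```

For the binary case with $n = 7$, $x_0 = 3$ is an integer, so both 3 and 4 are maximizers, matching the symmetry of the binomial coefficients.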



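Proposition 2.6. Let $a \in \mathbb{F}_q$ and $\mathbf{x}, \mathbf{y}, \mathbf{z} \in \mathbb{F}_q^n$.

(1) $0 \le h(\mathbf{x}) < n$.

(2) For $q = 2$, $h(\mathbf{x}) = 0$ if and only if $\mathbf{x} = \mathbf{0}$ or $\mathbf{1}$.
For $q \ge 3$, $h(\mathbf{x}) = 0$ if and only if $\mathbf{x} = \mathbf{0}$.

(3) $h(a\mathbf{x}) = |a| h(\mathbf{x})$ with $|a| := \mathrm{wt}(a)$.

(4) $q^{h(\mathbf{x} + \mathbf{y})} \le \beta(\mathrm{wt}(\mathbf{x}), \mathrm{wt}(\mathbf{y})) \, q^{h(\mathbf{x}) + h(\mathbf{y})}$, where

$$\beta(w_1, w_2) := \frac{\max\{w_1, n - w_1, w_2, n - w_2\}}{n} \in [0.5, 1].$$

(5) $0 \le d_E(\mathbf{x}, \mathbf{y}) < n$. (Non-negativity)

(6) For $q = 2$, $d_E(\mathbf{x}, \mathbf{y}) = 0$ if and only if $\mathbf{x} = \mathbf{y}$ or $\mathbf{x} = \mathbf{y} + \mathbf{1}$.
For $q \ge 3$, $d_E(\mathbf{x}, \mathbf{y}) = 0$ if and only if $\mathbf{x} = \mathbf{y}$. (Identity of indiscernibles)

(7) $d_E(\mathbf{x}, \mathbf{y}) = d_E(\mathbf{y}, \mathbf{x})$. (Symmetry)

(8) $d_E(\mathbf{x}, \mathbf{z}) \le d_E(\mathbf{x}, \mathbf{y}) + d_E(\mathbf{y}, \mathbf{z})$. (Triangle inequality)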
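A brute-force sanity check of the subadditivity statements is possible for tiny parameters. The sketch below tests, over a prime field, the inequality $n\,q^{h(\mathbf{x}+\mathbf{y})} \le \max\{w_1, n-w_1, w_2, n-w_2\}\, q^{h(\mathbf{x})+h(\mathbf{y})}$ (property (4) multiplied through by $n$, under the reading of $\beta$ as that maximum divided by $n$) and the multiplicative form of the triangle inequality. All names are illustrative, and a passing check on small cases is evidence, not a proof.

```python
from itertools import product
from math import comb

def surface_area(q, n, i):
    """q ** h_{q,n}(i) as an exact integer."""
    return comb(n, i) * (q - 1) ** i

def weight(x):
    return sum(1 for s in x if s)

def check_subadditivity(q, n):
    """Exhaustively test property (4) and the triangle inequality in F_q^n."""
    area = [surface_area(q, n, i) for i in range(n + 1)]
    vectors = list(product(range(q), repeat=n))
    for x in vectors:
        for y in vectors:
            w1, w2 = weight(x), weight(y)
            w3 = weight(tuple((a + b) % q for a, b in zip(x, y)))
            # Property (4), scaled by n to stay in integer arithmetic.
            if n * area[w3] > max(w1, n - w1, w2, n - w2) * area[w1] * area[w2]:
                return False
            # h(x + y) <= h(x) + h(y), i.e. the triangle inequality for d_E.
            if area[w3] > area[w1] * area[w2]:
                return False
    return True

for q, n in [(2, 4), (2, 5), (3, 3)]:
    print(q, n, check_subadditivity(q, n))
```

Working with the integers $q^{h}$ instead of logarithms avoids rounding issues in the equality cases, such as adding the all-one vector in the binary case.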

In the next few sections, we shall investigate the issue about linear codes and linear encoders with
large entropy distance as an independent mathematical problem, but the reader should keep in mind
that entropy distance is only a simplification of condition (1).

The proofs of results in this section are presented in Appendix B.
3 Entropy Distance of Linear Codes

In this section, we shall investigate the properties of entropy distance of linear codes, especially
concerning those codes with large entropy distance. Let us begin with some examples about the entropy
distance of some familiar linear codes.

Recall that an $[n, k]$ linear code $C$ is characterized by a $k \times n$ generator matrix $G$ (such that
$C = \{\mathbf{x}G : \mathbf{x} \in \mathbb{F}_q^k\}$) or an $(n - k) \times n$ parity-check matrix $H$ (such that $C = \{\mathbf{x} \in \mathbb{F}_q^n : H\mathbf{x}^T = \mathbf{0}\}$).
A linear code is called the dual of $C$ if its parity-check (resp., generator) matrix is a generator (resp.,
parity-check) matrix of $C$. We denote the dual of $C$ by $C^{\perp}$. The famous MacWilliams identities tell us
that

$$W_{C^{\perp}}(x, y) = \frac{1}{|C|} W_C(y - x, y + (q - 1)x),$$

where $W_C(x, y) := \sum_{i=0}^{n} A_i(C) x^i y^{n-i}$ is the (homogeneous) weight enumerator of $C$ (see e.g., pQ).

Example 3.1. Let $C$ be an $[n, 1, n]$ repetition code whose generator matrix is a $1 \times n$ all-one matrix.
Then its weight enumerator is

$$W_C(x, y) = (q - 1)x^n + y^n \tag{5}$$

and hence $d_E(C) = n \log_q (q - 1)$.

Example 3.2. Let $C$ be an $[n, n - 1, 2]$ single parity-check code whose parity-check matrix is a $1 \times n$
all-one matrix, where $n \ge 2$. By MacWilliams identities with (5), we have

$$W_C(x, y) = \frac{1}{q} \left\{ (q - 1)(y - x)^n + [y + (q - 1)x]^n \right\},$$

hence

$$A_1(C) = 0, \quad A_2(C) = (q - 1)\binom{n}{2},$$

$$A_{n-1}(C) = \frac{n\left[(q - 1)^{n-1} + (q - 1)(-1)^{n-1}\right]}{q}, \quad A_n(C) = \frac{(q - 1)^n + (q - 1)(-1)^n}{q},$$

and therefore

$$d_E(C) = \begin{cases} \log_2 n & \text{if } q = 2 \text{ and } n \text{ is odd,} \\ 0 & \text{if } q = 2 \text{ and } n \text{ is even,} \\ n \log_3 2 & \text{if } q = 3 \text{ and } n = 3, 4, 5, \\ h_{q,n}(2) & \text{otherwise.} \end{cases}$$
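The case analysis of Example 3.2 can be cross-checked by exhaustive enumeration for small prime q. The sketch below works with the integer $q^{d_E(C)}$ (the smallest "surface area" over nonzero codewords) to avoid floating-point comparisons; the function names and the case table in `predicted` are a transcription of the four cases as read here, so they should be treated as assumptions.

```python
from itertools import product
from math import comb

def spc_min_surface_area(q, n):
    """q ** d_E(C) for the single parity-check code {x : sum(x) = 0 in F_q}."""
    best = None
    for x in product(range(q), repeat=n):
        if any(x) and sum(x) % q == 0:
            w = sum(1 for s in x if s)
            area = comb(n, w) * (q - 1) ** w
            best = area if best is None else min(best, area)
    return best

def predicted(q, n):
    """Case analysis of Example 3.2, expressed as q ** d_E(C)."""
    if q == 2:
        return n if n % 2 == 1 else 1
    if q == 3 and n in (3, 4, 5):
        return 2 ** n
    return comb(n, 2) * (q - 1) ** 2

for q, n in [(2, 5), (2, 6), (3, 4), (3, 6), (5, 3)]:
    print(q, n, spc_min_surface_area(q, n), predicted(q, n))
```

The ternary exception arises because $2^n < 4\binom{n}{2}$ exactly for $n = 3, 4, 5$, so the full-weight codewords become the bottleneck there.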



Example 3.3. Let $C$ be a $[(q^k - 1)/(q - 1), k, q^{k-1}]$ simplex code whose generator matrix consists of
$(q^k - 1)/(q - 1)$ pairwise linearly independent column vectors, each chosen from a 1-dimensional subspace
of $\mathbb{F}_q^k$. By [1, Theorem 2.7.5], its weight enumerator is

$$W_C(x, y) = (q^k - 1) x^{q^{k-1}} y^{(q^{k-1} - 1)/(q - 1)} + y^{(q^k - 1)/(q - 1)}, \tag{6}$$

hence $d_E(C) = h_{q,(q^k - 1)/(q - 1)}(q^{k-1})$, which is the maximum of $h_{q,(q^k - 1)/(q - 1)}$ by Proposition 2.5.
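The simplex-code claim can be checked on small instances. The sketch below builds a generator matrix by picking, for each 1-dimensional subspace of $\mathbb{F}_q^k$, the representative whose first nonzero entry is 1 (an arbitrary normalization chosen here), then verifies that every nonzero codeword has weight $q^{k-1}$ and that this weight maximizes $h_{q,n}$. A prime q is assumed, and all names are illustrative.

```python
from itertools import product
from math import comb

def simplex_generator(q, k):
    """k x n generator matrix with one column per 1-dim subspace of F_q^k."""
    cols = [v for v in product(range(q), repeat=k)
            if any(v) and next(s for s in v if s) == 1]
    return [[col[i] for col in cols] for i in range(k)]

def nonzero_weights(G, q):
    """Set of Hamming weights of the nonzero codewords generated by G."""
    k, n = len(G), len(G[0])
    weights = set()
    for msg in product(range(q), repeat=k):
        if any(msg):
            c = [sum(m * G[i][j] for i, m in enumerate(msg)) % q for j in range(n)]
            weights.add(sum(1 for s in c if s))
    return weights

def maximizers(q, n):
    """Indices i at which C(n, i) (q - 1)^i, hence h_{q,n}(i), is largest."""
    areas = [comb(n, i) * (q - 1) ** i for i in range(n + 1)]
    best = max(areas)
    return {i for i, a in enumerate(areas) if a == best}

for q, k in [(2, 3), (3, 2), (3, 3)]:
    G = simplex_generator(q, k)
    n = len(G[0])
    print(q, k, n, nonzero_weights(G, q), maximizers(q, n))
```

Because the code is constant-weight at a maximizer of $h_{q,n}$, no other linear code of the same length can have larger entropy distance.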

Example 3.4. Let $C$ be a $[(q^k - 1)/(q - 1), (q^k - 1)/(q - 1) - k, 3]$ Hamming code, the dual of a
$[(q^k - 1)/(q - 1), k]$ simplex code, where $k \ge 2$. Using MacWilliams identities with (6), we get

$$W_C(x, y) = \frac{1}{q^k} \left\{ (q^k - 1)(y - x)^{q^{k-1}} [y + (q - 1)x]^{(q^{k-1} - 1)/(q - 1)} + [y + (q - 1)x]^{(q^k - 1)/(q - 1)} \right\},$$

hence

$$A_1(C) = A_2(C) = 0, \quad A_3(C) > 0,$$

$$A_n(C) = \frac{(q - 1)^{(q^{k-1} - 1)/(q - 1)} \left[ (q - 1)^{q^{k-1}} + (-1)^{q^{k-1}} (q^k - 1) \right]}{q^k},$$

and therefore

$$d_E(C) = \begin{cases} 0 & \text{if } q = 2, \\ 5 \log_4 3 & \text{if } q = 4 \text{ and } k = 2, \\ h_{q,(q^k - 1)/(q - 1)}(3) & \text{otherwise.} \end{cases}$$

Example 3.5. Let $C$ be the $[2^m, \sum_{i=0}^{r} \binom{m}{i}, 2^{m-r}]$ $r$th order binary Reed-Muller (RM) code (see e.g.,
IUGKA), where $0 \le r \le m$. Then $d_E(C) = 0$ because the 0th order binary RM code, which is contained
in every $r$th order RM code, contains the all-one vector. Certainly, it is easy to construct codes of large
entropy distance from RM codes with $r \ge 1$. We may choose an arbitrary coordinate $i$ and let $C'$ be
the subcode of $C$ in which every codeword has symbol zero in coordinate $i$. Clearly, $C'$ is of dimension
$\sum_{i=1}^{r} \binom{m}{i}$ and its entropy distance is $\log_2 \binom{2^m}{2^{m-r}}$. Puncturing $C'$ on coordinate $i$ further gives a code $C''$
of length $2^m - 1$ and entropy distance

$$\min\left\{ \log_2 \binom{2^m - 1}{2^{m-r}}, \log_2 \binom{2^m - 1}{2^m - 2^{m-r}} \right\} = \log_2 \binom{2^m - 1}{2^{m-r} - 1}.$$

Indeed, the binary simplex code can be constructed in this way with $r = 1$.

Next, we present several bounds on the size of a linear code with a given entropy distance. For
$0 \le h \le h_{q,n}(\lceil [(q - 1)n - 1]/q \rceil)$, we denote by $D_q(n, h)$ the largest number of codewords in a linear
code over $\mathbb{F}_q$ of length $n$ and entropy distance not less than $h$. The next few propositions provide rather
simple properties of $D_q(n, h)$.

Proposition 3.6. 

n( l (i^ i 2 ^ lfq = 2 ' 
I q otherwise. 

The proof is left to the reader. 

Proposition 3.7. Let $[d_1, d_2] = h_{q,n}^{-1}([h, n))$. If $d_1 \ge 2$, then

$$D_q(n, h) \le D_q(n - 1, \min\{h_{q,n-1}(d_1 - 1), h_{q,n-1}(\min\{d_2, n - 1\})\}).$$

Proof. Let $C$ be a linear code of length $n$ and entropy distance at least $h$ with $M$ codewords. Then the
distances between distinct codewords in $C$ range from $d_1$ to $d_2$. Since $d_1 \ge 2$, puncturing on any
coordinate yields a code $C'$ also with $M$ codewords, and the distances between distinct codewords in $C'$
are between $d_1 - 1$ and $\min\{d_2, n - 1\}$, so that the entropy distance of $C'$ is bounded below by either
$h_{q,n-1}(d_1 - 1)$ or $h_{q,n-1}(\min\{d_2, n - 1\})$. Therefore

$$M \le D_q(n - 1, \min\{h_{q,n-1}(d_1 - 1), h_{q,n-1}(\min\{d_2, n - 1\})\}),$$

and the proof is complete by letting $M = D_q(n, h)$. □

Proposition 3.8. Let $[d_1, d_2] = h_{q,n}^{-1}([h, n))$. Then

$$D_q(n, h) \le q D_q(n - 1, \min\{h_{q,n-1}(d_1), h_{q,n-1}(\min\{d_2, n - 1\})\}).$$

Proof. Let $C$ be a linear code of length $n$ and entropy distance at least $h$ with $M$ codewords. Let $C(x)$
be the subcode of $C$ in which every codeword ends with symbol $x$. Then $C(0)$ contains at least $M/q$
codewords. Puncturing this code on coordinate $n$ gives a code $C'$ of length $n - 1$, and the distances
between distinct codewords in $C'$ are between $d_1$ and $\min\{d_2, n - 1\}$, so that the entropy distance of $C'$
is bounded below by either $h_{q,n-1}(d_1)$ or $h_{q,n-1}(\min\{d_2, n - 1\})$. Therefore

$$q^{-1} M \le D_q(n - 1, \min\{h_{q,n-1}(d_1), h_{q,n-1}(\min\{d_2, n - 1\})\}),$$

and the proof is complete by letting $M = D_q(n, h)$. □

Now we shall derive several simple bounds on $D_q(n, h)$. By convention, a lower bound $L(n, h)$ of
$D_q(n, h)$ is said to be achieved by some linear code $C$ if $|C| \ge L(n, h)$ and $d_E(C) \ge h$. If there is a
family $\{C_i\}_{i=1}^{\infty}$ of linear codes $C_i \subseteq \mathbb{F}_q^{n_i}$ (supposing $n_i$ is strictly increasing in $i$) such that

$$\liminf_{i \to \infty} \frac{\log_q |C_i| - \log_q L(n_i, n_i h)}{n_i} \ge 0 \tag{7a}$$

and

$$\liminf_{i \to \infty} \frac{d_E(C_i)}{n_i} \ge h \tag{7b}$$

for some $h \in (0, 1)$, then we say the lower bound is asymptotically achieved by $\{C_i\}_{i=1}^{\infty}$.
The first is a lower bound, an analogue of the Gilbert bound [HE].



Theorem 3.9.

$$D_q(n, h) \ge \frac{q^n}{\sum_{i : h_{q,n}(i) < h} \binom{n}{i} (q - 1)^i}. \tag{8}$$

Proof. Let $B := \{\mathbf{x} \in \mathbb{F}_q^n : h(\mathbf{x}) < h\}$. It is clear that $aB \subseteq B$ for $a \in \mathbb{F}_q$. Let $C$ be a maximal linear
code in the sense that $d_E(C) \ge h$ and any larger linear code containing $C$ has entropy distance less than
$h$. Then by Lemma A.2, $C$ is a maximal $B$-separable subspace of $\mathbb{F}_q^n$, and it satisfies $\bigcup_{\mathbf{c} \in C} (\mathbf{c} + B) = \mathbb{F}_q^n$,
so that $|C| \ge q^n / |B|$, which establishes the theorem. □
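The Gilbert-type bound is straightforward to evaluate. The sketch below takes the target entropy distance in the form $h = h_{q,n}(d)$ for a chosen weight $d$, so that the comparison $h_{q,n}(i) < h$ can be done exactly on the integers $\binom{n}{i}(q-1)^i$; it returns the bound $\lceil q^n/|B| \rceil$ together with the smallest dimension $k$ with $q^k$ at least that bound. The interface is an assumption made here for illustration.

```python
from math import comb

def surface_area(q, n, i):
    """q ** h_{q,n}(i) as an exact integer."""
    return comb(n, i) * (q - 1) ** i

def gilbert_type_bound(q, n, d):
    """Lower bound on D_q(n, h) for h = h_{q,n}(d): (size bound, dimension bound)."""
    threshold = surface_area(q, n, d)
    # |B| counts all vectors whose entropy weight is strictly below h.
    ball = sum(surface_area(q, n, i) for i in range(n + 1)
               if surface_area(q, n, i) < threshold)
    if ball == 0:
        raise ValueError("h must be positive")
    size = -(-q ** n // ball)  # ceil(q^n / |B|)
    k = 0
    while q ** k < size:
        k += 1
    return size, k

for d in (1, 2, 3):
    print(d, gilbert_type_bound(2, 7, d))
```

For binary length 7 and $h = \log_2 21$, the excluded set consists of the weights 0, 1, 6, 7, giving $|B| = 16$ and hence a code of dimension at least 3.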

Just like the Gilbert bound, it is difficult to construct long codes achieving (8). But at least, we
know that regular low-density parity-check codes (with the row weight of parity-check matrix being the
logarithm of code length$^4$) can achieve (8) asymptotically, an easy consequence of the analysis of weight
distribution of LDPC codes (see e.g., Theorem 5.6 and Remark 5.7]).

The second is an upper bound, a simple modification of the Hamming bound (see e.g., [1, Theorem 1.12.1]).



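Theorem 3.10.

$$D_q(n, h) \le \begin{cases} \dfrac{2^n}{2 \sum_{i=0}^{t} \binom{n}{i}} & \text{if } q = 2, \\[2ex] \dfrac{q^n}{\sum_{i=0}^{t} \binom{n}{i} (q - 1)^i} & \text{otherwise,} \end{cases} \tag{9}$$

where $t = \lceil d/2 \rceil - 1$ and $d$ is the smallest integer such that $h_{q,n}(d) \ge h$.

Proof. Since the case of $q \ge 3$ is the same as the original Hamming bound, we only prove the case of
$q = 2$. Let $C$ be a (linear) code of length $n$ and entropy distance $h$. Then by definition, the weight of
all nonzero codewords is between $d$ and $n - d$. Let

$$B_1 := \{\mathbf{x} \in \mathbb{F}_2^n : \mathrm{wt}(\mathbf{x}) \le t\}$$

and

$$B_2 := \{\mathbf{x} \in \mathbb{F}_2^n : \mathrm{wt}(\mathbf{x}) \ge n - t\}.$$

Then for any $\mathbf{x} \in B_1$, $\mathbf{y} \in B_2$, and nonzero $\mathbf{c} \in C$, we have

$$\mathrm{wt}(\mathbf{x} - \mathbf{c}) \ge \mathrm{wt}(\mathbf{c}) - \mathrm{wt}(\mathbf{x}) \ge d - t > t,$$

$$\mathrm{wt}(\mathbf{x} - \mathbf{c}) \le \mathrm{wt}(\mathbf{x}) + \mathrm{wt}(\mathbf{c}) \le t + n - d < n - t,$$

$$\mathrm{wt}(\mathbf{y} - \mathbf{c}) \ge \mathrm{wt}(\mathbf{y}) - \mathrm{wt}(\mathbf{c}) \ge n - t - (n - d) > t,$$

$$\mathrm{wt}(\mathbf{y} - \mathbf{c}) \le \mathrm{wt}(\mathbf{y} - \mathbf{1}) + \mathrm{wt}(\mathbf{1} - \mathbf{c}) \le t + n - d < n - t.$$

This implies that the family $\{\mathbf{c} + B\}_{\mathbf{c} \in C}$ of sets with $B := B_1 \cup B_2$ is pairwise disjoint, so that $|B||C| \le q^n$,
which establishes the theorem. □

4 In the binary case, the row weight must be odd.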



The third is also an upper bound, an analogue of the Singleton bound [19].

Theorem 3.11. Let $[d_1, d_2] = h_{q,n}^{-1}([h, n))$. Then

$$D_q(n, h) \le q^{\min\{n - d_1 + 1, d_2\}}. \tag{10}$$

Proof. By the Singleton bound, it suffices to show that $D_q(n, h) \le q^{d_2}$, which is obviously true by
considering the standard form $(I_k \ A)$ of generator matrix of an $[n, k]$ linear code: if $k > d_2$, the codeword
of the all-one message would have weight at least $k > d_2$. □
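The two upper bounds can be evaluated side by side. The sketch below assumes the following readings, both inferred from the proofs rather than quoted: the Hamming-type bound is $q^n$ divided by the volume of a ball of radius $t = \lceil d_1/2 \rceil - 1$ (doubled when $q = 2$, to account for the mirrored ball around the all-one vector), and the Singleton-type bound is $q^{\min\{n - d_1 + 1, d_2\}}$, where $[d_1, d_2]$ is the range of weights whose entropy weight is at least $h = h_{q,n}(d)$. Function names are illustrative.

```python
from math import comb

def surface_area(q, n, i):
    """q ** h_{q,n}(i) as an exact integer."""
    return comb(n, i) * (q - 1) ** i

def upper_bounds(q, n, d):
    """(Hamming-type, Singleton-type) upper bounds on D_q(n, h), h = h_{q,n}(d)."""
    areas = [surface_area(q, n, i) for i in range(n + 1)]
    threshold = areas[d]
    if threshold <= 1:
        raise ValueError("h must be positive")
    # Weights whose entropy weight is at least h form an interval [d1, d2].
    support = [i for i in range(1, n + 1) if areas[i] >= threshold]
    d1, d2 = support[0], support[-1]
    t = (d1 + 1) // 2 - 1  # ceil(d1 / 2) - 1
    volume = sum(areas[: t + 1])
    if q == 2:
        volume *= 2  # balls around 0 and around the all-one vector
    return q ** n // volume, q ** min(n - d1 + 1, d2)

for d in (2, 3):
    print(d, upper_bounds(2, 7, d))
```

For binary length 7 and $h = \log_2 35$, the Hamming-type bound gives 8, so a code of dimension 3 with all nonzero weights in $\{3, 4\}$ would be the best possible under these assumptions.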

For illustration, we compute in Table 1 the lower and upper bounds of the largest entropy distance
of a $[7, k]$ binary linear code for $1 \le k \le 6$ as well as examples achieving the lower bound.

Table 1: The lower and upper bounds of the largest entropy distance of a [7, k] binary linear code 

k The lower bound The upper bound Examples (generator matrix) Entropy distance 

(by (0)) (by ©, (TTO1), and (HQ)) 



log 



2 K 3 J 

2 ( 2 ; 



log. '' 



log 



2 12J 



l0 S2 (1) 

i°g 2 (D 

l°g2 (D 



!og 2 (D 
!og 2 (D 

iog 2 (D 

!og 2 (D 

iog 2 (D 

log 2 (D 



(1110 0) 
'11 1000 0> 
v l 1 1 Oj 
10 10 10 1 
110 11 

1 1 1 1 
/l 1 1 1\ 

110 11 

1 1 1 

\o 1 i o/ 

(I 5 T T ) 

(i 6 o T ) 



!og 2 (D 
!og 2 (D 



log 2 (g) (cf. Example 



iog 2 (D 
i°g 2 (D 

log 2 (D 



We close this section with a result on $D_2(n, h_{2,n}(2))$.

Theorem 3.12. For $n \ge 4$,

$$D_2(n, h_{2,n}(2)) = \begin{cases} 2^{n-3} & \text{if } n \text{ is odd,} \\ 2^{n-2} & \text{if } n \text{ is even.} \end{cases}$$

Proof. The even case can be easily proved by Theorem 3.11 and the generator matrix $(I_{n-2} \ \mathbf{1}^T \ \mathbf{0}^T)$.

As for the odd case, by Theorem 3.11 it suffices to show that $D_2(n, h_{2,n}(2)) \ne 2^{n-2}$ and provide one
example of $[n, n - 3]$ linear code of entropy distance at least $h_{2,n}(2)$.

We first show that $D_2(n, h_{2,n}(2)) \ne 2^{n-2}$. If it were false, then there would exist an $[n, n - 2]$ linear
code $C$ of entropy distance at least $h_{2,n}(2)$. Let

$$G := (I_{n-2} \ \mathbf{v}_1^T \ \mathbf{v}_2^T)$$

be the standard form of the generator matrix with $\mathbf{v}_1, \mathbf{v}_2 \in \mathbb{F}_2^{n-2}$. Since the weight of the codeword $\mathbf{1}G$
must not be greater than $n - 2$, we have $\mathbf{1}G = (1, \ldots, 1, 0, 0)$, so that $\mathbf{v}_1$ and $\mathbf{v}_2$ must contain an even
number of ones. Next, let $\{\mathbf{e}_k\}_{k=1}^{n-2}$ be the standard basis of $\mathbb{F}_2^{n-2}$. Then the weight of the codeword
$\mathbf{e}_k G = (\mathbf{e}_k, v_{1,k}, v_{2,k})$ must not be less than two, so either $v_{1,k}$ or $v_{2,k}$ or both are one. Because $n - 2$
is odd, there would exist $k$ such that $v_{1,k} = v_{2,k} = 1$. However, the weight of $(\mathbf{1} - \mathbf{e}_k)G$ would be
$(n - 3) + 2 = n - 1$, which is absurd.

For an example of $[n, n - 3]$ linear code of entropy distance at least $h_{2,n}(2)$, consider the generator
matrix $(I_{n-3} \ \mathbf{1}^T \ \mathbf{0}^T \ \mathbf{0}^T)$, which clearly has entropy distance $h_{2,n}(2)$. □
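Theorem 3.12 lends itself to a small computational check. The sketch below assumes the example generator matrices have the form (identity block, one all-one column, then one or two zero columns), which is one reading of the matrices in the proof, and represents binary vectors as integer bit masks. It also runs an exhaustive search for tiny lengths; the function names are illustrative.

```python
from math import comb

def allowed(n, w):
    """True if a binary vector of weight w has entropy weight >= h_{2,n}(2)."""
    return comb(n, w) >= comb(n, 2)

def example_code_weights(n):
    """Nonzero codeword weights of (I_k | 1^T | 0 ...), k = n-2 (even n) or n-3 (odd n)."""
    k = n - 2 if n % 2 == 0 else n - 3
    rows = [(1 << i) | (1 << k) for i in range(k)]  # identity bit plus parity bit
    span = {0}
    for r in rows:
        span |= {s ^ r for s in span}
    return {bin(c).count("1") for c in span if c}

def exists_code(n, k):
    """Exhaustive search for an [n, k] binary code with all nonzero weights allowed."""
    ok = {v for v in range(1, 2 ** n) if allowed(n, bin(v).count("1"))}

    def extend(span, dim):
        if dim == k:
            return True
        for v in ok:
            if v in span:
                continue
            new = {s ^ v for s in span}
            if all(w in ok for w in new) and extend(span | new, dim + 1):
                return True
        return False

    return extend({0}, 0)

for n in range(4, 10):
    print(n, sorted(example_code_weights(n)))
print(exists_code(5, 2), exists_code(5, 3))
```

The search confirms, for $n = 4$ and $n = 5$, that the dimension cannot be raised by one, in line with the parity argument in the proof.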

4 Entropy Distance of Linear Encoders 

In this section we shall define entropy distance in a more general space, the direct product of two vector
spaces. In particular, we shall define and study the entropy distance of a linear encoder.
Let $\mathbb{F}_q^k \times \mathbb{F}_q^n$ denote the direct product of $\mathbb{F}_q^k$ and $\mathbb{F}_q^n$. A vector $\mathbf{x} \in \mathbb{F}_q^k \times \mathbb{F}_q^n$ is written as

$$(\mathbf{x}_1, \mathbf{x}_2) := (x_{1,1}, \ldots, x_{1,k}, x_{2,1}, \ldots, x_{2,n}).$$

For an all-$c$ vector in $\mathbb{F}_q^k \times \mathbb{F}_q^n$ with $c \in \mathbb{F}_q$, we still write $\mathbf{c}$ for short. A linear encoder $f : \mathbb{F}_q^k \to \mathbb{F}_q^n$
is a linear transformation from $\mathbb{F}_q^k$ to $\mathbb{F}_q^n$. The rate of $f$ is defined to be $k/n$. Usually $f$ is identified
with its associated $k \times n$ transformation matrix, which is called generator matrix in coding theory.
A linear encoder is said to be of full rank if its generator matrix is of full rank. A full-rank linear
encoder is necessary for efficient information processing because the full-rank condition ensures that no
information is lost during encoding (injective for $k \le n$) or no vectors in the output vector space are
wasted (surjective for $k \ge n$).

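Definition 4.1. The entropy distance $d_E(\mathbf{x}, \mathbf{y})$ between $\mathbf{x}, \mathbf{y} \in \mathbb{F}_q^k \times \mathbb{F}_q^n$ is defined by

$$d_E(\mathbf{x}, \mathbf{y}) := d_E(\mathbf{x}_1, \mathbf{y}_1) + d_E(\mathbf{x}_2, \mathbf{y}_2).$$

Likewise, the entropy weight $h(\mathbf{x})$ of $\mathbf{x} \in \mathbb{F}_q^k \times \mathbb{F}_q^n$ is defined by

$$h(\mathbf{x}) := h(\mathbf{x}_1) + h(\mathbf{x}_2) = d_E(\mathbf{x}, \mathbf{0}).$$

The entropy distance $d_E(V)$ of a subspace $V$ of $\mathbb{F}_q^k \times \mathbb{F}_q^n$ is defined to be the smallest entropy distance
between distinct vectors in $V$, or equivalently, the minimum entropy weight of nonzero vectors in $V$.
Then the entropy distance $d_E(f)$ of a linear encoder $f : \mathbb{F}_q^k \to \mathbb{F}_q^n$ is defined to be the entropy distance
of its graph $\{(\mathbf{x}_1, f(\mathbf{x}_1)) : \mathbf{x}_1 \in \mathbb{F}_q^k\}$.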
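The entropy distance of an encoder depends on the generator matrix, not only on the code, because the input weight enters the sum. The sketch below computes it by enumerating all nonzero inputs over a prime field; the function names and the two sample matrices (a generator of the binary [7, 3] simplex code and the 3-by-3 identity) are illustrative assumptions.

```python
from itertools import product
from math import comb, log

def h(q, n, i):
    """h_{q,n}(i) = log_q [ C(n, i) (q - 1)^i ]."""
    return log(comb(n, i) * (q - 1) ** i, q)

def weight(x):
    return sum(1 for s in x if s)

def encoder_entropy_distance(G, q):
    """min over nonzero x1 of h(x1) + h(x1 G), for the encoder x1 -> x1 G."""
    k, n = len(G), len(G[0])
    best = None
    for x1 in product(range(q), repeat=k):
        if not any(x1):
            continue
        x2 = [sum(m * G[i][j] for i, m in enumerate(x1)) % q for j in range(n)]
        value = h(q, k, weight(x1)) + h(q, n, weight(x2))
        best = value if best is None else min(best, value)
    return best

G_SIMPLEX = [
    [1, 0, 0, 1, 1, 0, 1],
    [0, 1, 0, 1, 0, 1, 1],
    [0, 0, 1, 0, 1, 1, 1],
]
IDENTITY_3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

print(encoder_entropy_distance(G_SIMPLEX, 2))
print(encoder_entropy_distance(IDENTITY_3, 2))
```

The binary identity encoder has entropy distance zero because the all-one input is mapped to the all-one output, both of entropy weight zero.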

By Proposition 2.6, it is easy to verify that the entropy distance in $\mathbb{F}_q^k \times \mathbb{F}_q^n$ is a metric for $q \ge 3$
and a pseudometric for $q = 2$. The idea of Definition 4.1 comes from the author's work on lossless
joint source-channel coding [T51[T7j. In a (distributed) lossless joint source-channel coding scheme based
on linear encoders, the sources are typically nonuniform (and correlated), and hence the output of a
linear encoder for small-weight or small-entropy-weight input vectors is very important. In other words,
even for the same linear code, different generator matrices may have very different performance. It is
found that a linear encoder that is (universally) good (in the scheme proposed by [15]) maps vectors of
small entropy weight to vectors of large entropy weight, an important property now characterized by
the entropy distance of a linear encoder.

Next, we study the lower and upper bounds on the largest entropy distance of a full-rank linear
encoder. We denote by $E_q(k, n)$ the largest entropy distance of a full-rank linear encoder $f : \mathbb{F}_q^k \to \mathbb{F}_q^n$.

Unlike the entropy distance of a linear code, the entropy distance of a linear encoder $f :
\mathbb{F}_q^k \to \mathbb{F}_q^n$ has a very simple and tight upper bound:


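Theorem 4.2.

$$E_q(k, n) \le \begin{cases} h_{2,n}\left(\left\lceil \dfrac{n - 1}{2} \right\rceil\right) & \text{if } q = 2, \\[2ex] h_{q,k}(1) + h_{q,n}\left(\left\lceil \dfrac{(q - 1)n - 1}{q} \right\rceil\right) & \text{otherwise.} \end{cases} \tag{12}$$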

The proof is left to the reader. 

The (asymptotic) tightness of the upper bound is ensured by the following lower bound. 
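Theorem 4.3.

$$E_q(k, n) \ge h_0 := \max\left\{ h : \sum_{\substack{1 \le i \le k,\ 0 \le j \le n \\ h_{q,k}(i) + h_{q,n}(j) < h}} \binom{k}{i} \binom{n}{j} (q - 1)^{i+j} \le (q - 1)(q^n - q^{k'-1}) \right\} \tag{13}$$

where $k' := \min\{k, n\}$.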

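Proof. Let $n' = \max\{k, n\}$, $l = n' - n$, and

$$B := \{(\mathbf{x}, \mathbf{x}') \in (\mathbb{F}_q^k \times \mathbb{F}_q^n) \times \mathbb{F}_q^l : h(\mathbf{x}) < h_0 \lor \mathbf{x}_1 = \mathbf{0} \lor (\mathbf{x}_2, \mathbf{x}') = \mathbf{0}\}.$$

It is clear that $aB \subseteq B$ for $a \in \mathbb{F}_q$ and that

$$|B| \le (q - 1)(q^n - q^{k'-1}) q^l + q^{n+l} + q^k - 1 = q^{n'+1} + q^{k-1} - 1.$$

Then by Lemma A.2, a maximal $B$-separable subspace $V$ of $\mathbb{F}_q^k \times \mathbb{F}_q^n \times \mathbb{F}_q^l$ satisfies

$$\bigcup_{\mathbf{v} \in V} (\mathbf{v} + B) = \mathbb{F}_q^k \times \mathbb{F}_q^n \times \mathbb{F}_q^l,$$

so that

$$|V| \ge \frac{q^{k+n+l}}{|B|} > \frac{q^{k+n'}}{q^{n'+2}} = q^{k-2}, \tag{14}$$

that is, the dimension of $V$ is at least $k - 1$.

For $(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}') \in \mathbb{F}_q^k \times \mathbb{F}_q^n \times \mathbb{F}_q^l$, we define the canonical projections

$$\pi_1(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}') := \mathbf{x}_1,$$
$$\pi_2(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}') := \mathbf{x}_2,$$
$$\pi_{23}(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}') := (\mathbf{x}_2, \mathbf{x}').$$

Since the kernel of $\pi_1$ (resp., $\pi_{23}$) is a subset of $B$, which intersects $V$ only at the zero vector, the kernel
of $\pi_1|_V$ (the restriction of $\pi_1$ to $V$) (resp., $\pi_{23}|_V$) contains only the zero vector, hence $\pi_1|_V$ (resp., $\pi_{23}|_V$)
is injective, and therefore the dimension of $V$ is at most $k$.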

Let $S = (\pi_1|_V)(V) \times (\pi_{23}|_V)(V)$. It is clear that $|S| \ge q^{2k-2}$ and that each $(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}') \in S \setminus V$ is
covered by $(\pi_1|_V)^{-1}(\mathbf{x}_1) + B$ and $(\pi_{23}|_V)^{-1}(\mathbf{x}_2, \mathbf{x}') + B$. Then the bound (14) can be improved to
\[
|V| \ge \frac{q^{k+n+l} + q^{2k-2}}{|B| + 1} > \frac{q^{k+n'} + q^{2k-2}}{q^{n'+1} + q^{k-1}} = q^{k-1},
\]
hence the dimension of $V$ is exactly $k$, and therefore $\pi_1|_V$ is an isomorphism. If $k \ge n$, then $\pi_{23}|_V$ is also
an isomorphism. Let $f$ be the composition $\pi_2 (\pi_1|_V)^{-1}$ from $\mathbb{F}_q^k$ to $\mathbb{F}_q^n$. We conclude that $f$ is a full-rank
linear encoder of entropy distance not less than $h_0$. The proof is complete. □

It is easy to see that the upper bound (12) is bounded above by $n + \log_q k + 1$ and that the lower
bound (13) is bounded below by
\[
\log_q \left( \frac{(q-1)^2 q^{n-1}}{k(n+1)} \right) \ge n - \log_q k - \log_q (n+1) - 1.
\]
Then the gap between the two bounds is of order $\log_q(k^2 n)$, which is asymptotically negligible relative
to $n$ if the rate $k/n$ is bounded. Another fact to be noted is that the kernel and image of a linear encoder
achieving the lower bound (13) also achieve the lower bound (jHJ) asymptotically.



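Example 4.4. Let $q = 2$, $k = 3$, and $n = 7$. By (12) we have
\[
E_2(3,7) \le h_{2,7}(3) = \log_2 35.
\]
Since
\[
\binom{3}{3}\binom{7}{7} + \binom{3}{1}\binom{7}{7} + \binom{3}{2}\binom{7}{7} + \binom{3}{3}\binom{7}{1} + \binom{3}{3}\binom{7}{6} = 21 < 124 = 2^7 - 2^2
\]
and
\[
21 + \binom{3}{3}\binom{7}{2} + \binom{3}{3}\binom{7}{5} + \binom{3}{1}\binom{7}{1} + \binom{3}{1}\binom{7}{6} + \binom{3}{2}\binom{7}{1} + \binom{3}{2}\binom{7}{6} = 147 > 124,
\]
it follows from (13) that
\[
E_2(3,7) \ge \log_2 \left( \binom{3}{3}\binom{7}{2} \right) = \log_2 21.
\]
For an example achieving this lower bound, consider the generator matrix of a [7, 3] simplex code (cf.
Example 3.3). Its entropy distance is $\log_2 \left( \binom{3}{3}\binom{7}{4} \right) = \log_2 35$, which coincides with the upper bound above.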
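
The numbers in Example 4.4 can be checked by brute force. The Python sketch below rests on two assumptions that are not spelled out in this section: the sum in (13) runs over $1 \le i \le k$ and $1 \le j \le n$, and the entropy distance of an encoder $f$ is the minimum of $h_{q,k}(\mathrm{wt}(\mathbf{x})) + h_{q,n}(\mathrm{wt}(f(\mathbf{x})))$ over nonzero inputs $\mathbf{x}$. All function names are illustrative; values are returned as $q^h$ to keep the arithmetic exact.

```python
from itertools import product
from math import comb

def area(q, n, i):
    """q^{h_{q,n}(i)}: number of vectors of Hamming weight i in F_q^n."""
    return comb(n, i) * (q - 1) ** i

def lower_bound_area(q, k, n):
    """q^{h_0} for (13): largest product t whose strictly smaller products sum below the budget."""
    budget = (q - 1) * (q ** n - q ** (min(k, n) - 1))
    prods = [area(q, k, i) * area(q, n, j)
             for i in range(1, k + 1) for j in range(1, n + 1)]
    feasible = [t for t in set(prods)
                if sum(p for p in prods if p < t) < budget]
    return max(feasible)

def weight(v):
    """Hamming weight of a tuple over F_q."""
    return sum(1 for a in v if a)

def encoder_distance_area(q, k, n, encode):
    """min over nonzero x of q^{h(x) + h(f(x))} (assumed definition, see above)."""
    return min(area(q, k, weight(x)) * area(q, n, weight(encode(x)))
               for x in product(range(q), repeat=k) if any(x))

# Generator matrix of the binary [7, 3] simplex code: its columns are all nonzero vectors of F_2^3.
COLUMNS = [c for c in product(range(2), repeat=3) if any(c)]

def simplex_encode(x):
    """Encode x in F_2^3 as x G, one inner product per column of G."""
    return tuple(sum(a * b for a, b in zip(x, c)) % 2 for c in COLUMNS)

print(lower_bound_area(2, 3, 7), encoder_distance_area(2, 3, 7, simplex_encode))  # 21 35
```

The first value reproduces the threshold $21$ in the lower bound, and the second reproduces the value $35$ attained by the simplex encoder.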

Constructing linear encoders achieving the lower bound (13) is a difficult problem. From the results in
[17], it follows that an arbitrary linear encoder of a linear code with large entropy distance concatenated
with a low-density generator-matrix encoder (with the column weight being the logarithm of the dimension
of the output vector space) can achieve (13) asymptotically (in a similar sense to (7)).
5 Conclusion

In this paper, we proposed a new distance called entropy distance for a linear code or a linear encoder.
The basic properties of entropy distance were investigated. Several bounds on the entropy distance were
derived. In particular, we obtained tight lower and upper bounds on the largest entropy distance of
a full-rank linear encoder (Theorems 4.2 and 4.3). Some concrete examples of linear codes and encoders
with large entropy distance were also provided.

As a mathematical problem, entropy distance raises many interesting issues, some of which are no
easier than their counterparts for Hamming distance, e.g., determining tight lower and upper bounds
on the largest size of a linear code given the length and entropy distance of the code (cf. Theorems 3.9
and 3.10). On the other hand, the significance of entropy distance for coding applications, which remains
for future study, is still far from being understood.



A Lemmas 



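Lemma A.1 (cf. [11, p. 284]). Let $q \ge 2$, $n \ge 1$, and $0 < k < n$. Then
\[
\sum_{i \in I} \binom{n}{i}(q-1)^i \le q^{n H_q(k/n)}
\]
with equality if and only if $k = (q-1)n/q$, where
\[
I := \left\{ 0 \le i \le n : \left( \frac{k}{(q-1)(n-k)} \right)^{k-i} \le 1 \right\}
= \begin{cases}
\{0, 1, \ldots, k\} & \text{if } k < (q-1)n/q, \\
\{0, 1, \ldots, n\} & \text{if } k = (q-1)n/q, \\
\{k, k+1, \ldots, n\} & \text{if } k > (q-1)n/q.
\end{cases}
\]

Proof. Using the binomial formula $\sum_{i=0}^{n} \binom{n}{i} x^i (1-x)^{n-i} = 1$ with $x = k/n$, we get
\[
1 = q^{-n H_q(k/n)} \sum_{i=0}^{n} \binom{n}{i} (q-1)^i \left( \frac{(q-1)(n-k)}{k} \right)^{k-i},
\]
and therefore $\sum_{i \in I} \binom{n}{i}(q-1)^i \le q^{n H_q(k/n)}$ with equality if and only if $I = \{0, 1, \ldots, n\}$.
□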
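
Lemma A.1 lends itself to an exact numerical check. The sketch below assumes the standard $q$-ary entropy function $H_q(x) = x \log_q(q-1) - x \log_q x - (1-x)\log_q(1-x)$, so that $q^{n H_q(k/n)} = (q-1)^k n^n / (k^k (n-k)^{n-k})$ is a rational number; the function name is illustrative.

```python
from fractions import Fraction
from math import comb

def lemma_a1_sides(q, n, k):
    """Return (lhs, rhs) of Lemma A.1 for 0 < k < n, where rhs = q^{n H_q(k/n)}."""
    if k * q < (q - 1) * n:
        index_set = range(0, k + 1)      # I = {0, ..., k}
    elif k * q == (q - 1) * n:
        index_set = range(0, n + 1)      # I = {0, ..., n}
    else:
        index_set = range(k, n + 1)      # I = {k, ..., n}
    lhs = sum(comb(n, i) * (q - 1) ** i for i in index_set)
    # q^{n H_q(k/n)} = (q-1)^k n^n / (k^k (n-k)^(n-k)), kept exact as a Fraction
    rhs = Fraction((q - 1) ** k * n ** n, k ** k * (n - k) ** (n - k))
    return lhs, rhs

print(lemma_a1_sides(2, 4, 2))  # equality case k = (q-1)n/q: both sides equal 2^4
```

Using `Fraction` rather than floating-point logarithms makes the equality case $k = (q-1)n/q$ testable without rounding issues.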



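Lemma A.2. Let $B$ be a subset of $\mathbb{F}_q^n$ such that $aB \subseteq B$ for $a \in \mathbb{F}_q$. A subset $S$ of $\mathbb{F}_q^n$ is said to be
$B$-separable if $S \cap (\mathbf{s} + B) = \{\mathbf{s}\}$ for each $\mathbf{s} \in S$. A $B$-separable subspace $V$ of $\mathbb{F}_q^n$ is said to be maximal if
any larger subspace containing $V$ is not $B$-separable. Then a maximal $B$-separable subspace $V$ satisfies
\[
\bigcup_{\mathbf{v} \in V} (\mathbf{v} + B) = \mathbb{F}_q^n.
\]

Proof. First note that for a vector space $V$, the $B$-separable condition is reduced to $V \cap B = \{\mathbf{0}\}$. We
suppose $V \ne \mathbb{F}_q^n$ and choose any $\mathbf{x} \notin V$. Since $V$ is maximal, the subspace $V' := \{a\mathbf{x} + \mathbf{v} : a \in \mathbb{F}_q, \mathbf{v} \in V\}$
is not $B$-separable, so that $V' \cap B$ contains a nonzero vector $\mathbf{x}' = a\mathbf{x} + \mathbf{v}'$ for some $a \in \mathbb{F}_q \setminus \{0\}$ and $\mathbf{v}' \in V$,
and hence $\mathbf{x} = a^{-1}\mathbf{x}' - a^{-1}\mathbf{v}'$. The proof is complete by noting that $-a^{-1}\mathbf{v}' \in V$ and $a^{-1}\mathbf{x}' \in B$. □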
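
The covering property in Lemma A.2 can be observed on a toy instance. The following sketch is restricted to $q = 2$, with vectors of $\mathbb{F}_2^n$ stored as integers (bit masks) and addition as XOR; the choice of $B$ as the Hamming ball of radius 1 in $\mathbb{F}_2^5$ and the greedy construction are illustrative assumptions, not taken from the paper. One pass over all vectors suffices for maximality, because a vector rejected against a smaller subspace stays rejected against any larger one.

```python
def maximal_separable_subspace(n, B):
    """Greedily grow a subspace V of F_2^n (ints as bit vectors) with V ∩ B = {0}."""
    V = {0}
    for x in range(1, 2 ** n):
        if x in V:
            continue
        coset = {x ^ v for v in V}   # the vectors that span(V, x) adds to V
        if not (coset & B):          # 0 is in B but never in the coset, since x is not in V
            V |= coset
    return V

n = 5
B = {x for x in range(2 ** n) if bin(x).count("1") <= 1}   # Hamming ball of radius 1
V = maximal_separable_subspace(n, B)
covered = {v ^ b for v in V for b in B}
print(len(V), covered == set(range(2 ** n)))
```

For this choice of $B$, a $B$-separable subspace is a binary linear code of minimum distance at least 2, and the lemma says that a maximal such code has covering radius at most 1.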
B The Proofs of Results in Section 2

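Proof of Proposition 2.1. For any $v \in \mathbb{F}_{q^n}^{\times}$, we define the linear transformation $f_v : \mathbb{F}_{q^n} \to \mathbb{F}_{q^n}$ given
by $x \mapsto vx$, which is also a linear transformation of $\mathbb{F}_q^n$ onto $\mathbb{F}_q^n$. Let $g$ be an arbitrary injective linear
transformation from $\mathbb{F}_q^k$ to $\mathbb{F}_q^n$. An $[n,k]$ linear code $C_v$ is defined to be the image $f_v(g(\mathbb{F}_q^k))$. Let us
compute the average weight distribution of $C_v$ over all $v \in \mathbb{F}_{q^n}^{\times}$ for nonzero weight $i$:
\begin{align*}
\frac{1}{q^n - 1} \sum_{v \in \mathbb{F}_{q^n}^{\times}} A_i(C_v)
&= \frac{1}{q^n - 1} \sum_{v \in \mathbb{F}_{q^n}^{\times}} \sum_{\mathbf{y} \in \mathbb{F}_q^n : \mathrm{wt}(\mathbf{y}) = i} \sum_{\mathbf{x} \in \mathbb{F}_q^k \setminus \{\mathbf{0}\}} 1\{f_v(g(\mathbf{x})) = \mathbf{y}\} \\
&= \sum_{\mathbf{x} \in \mathbb{F}_q^k \setminus \{\mathbf{0}\}} \sum_{\mathbf{y} \in \mathbb{F}_q^n : \mathrm{wt}(\mathbf{y}) = i} \frac{1}{q^n - 1} \sum_{v \in \mathbb{F}_{q^n}^{\times}} 1\{f_v(g(\mathbf{x})) = \mathbf{y}\} \\
&= (q^n - 1)^{-1} (q^k - 1) \binom{n}{i} (q-1)^i. \tag{15}
\end{align*}
It is then easy to show that there is a linear code $C_v$ such that (1) holds. If it were false, then for every
$v \in \mathbb{F}_{q^n}^{\times}$ there would exist $i \ne 0$ such that
\[
A_i(C_v) > n q^{-(n-k)} \binom{n}{i} (q-1)^i,
\]
so that there exists at least one $j$ such that at least $(q^n - 1)/n$ linear codes of $\{C_v : v \in \mathbb{F}_{q^n}^{\times}\}$ satisfy
\[
A_j(C_v) > n q^{-(n-k)} \binom{n}{j} (q-1)^j,
\]
and therefore the average weight distribution of $\{C_v : v \in \mathbb{F}_{q^n}^{\times}\}$ for weight $j$ should be no less than
\[
q^{-(n-k)} \binom{n}{j} (q-1)^j,
\]
which is absurd by (15). The proof is complete. □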
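
The averaging identity (15) can be verified exhaustively in a toy case. The sketch below assumes $q = 2$, $n = 3$, $k = 2$, represents $\mathbb{F}_8$ as 3-bit integers modulo the irreducible polynomial $x^3 + x + 1$, and takes $g$ to be the embedding of $\mathbb{F}_2^2$ into the two low-order coordinates; these choices and all names are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

N, K = 3, 2        # F_8 viewed as F_2^3; toy dimension k = 2
MOD = 0b1011       # x^3 + x + 1, irreducible over F_2

def gf_mul(a, b):
    """Multiply two elements of F_8 (3-bit ints) modulo x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= MOD
    return r

def g(x):
    """Toy injective linear map F_2^2 -> F_2^3: keep the two low bits, top bit 0."""
    return x

def average_weight_distribution():
    """Average of A_i(C_v) over all nonzero v in F_8, for each weight i = 1..N."""
    avg = {}
    for i in range(1, N + 1):
        total = sum(1 for v in range(1, 2 ** N) for x in range(1, 2 ** K)
                    if bin(gf_mul(v, g(x))).count("1") == i)
        avg[i] = Fraction(total, 2 ** N - 1)
    return avg

print(average_weight_distribution())
```

Each average should equal $(q^n - 1)^{-1}(q^k - 1)\binom{n}{i}(q-1)^i$, because for fixed nonzero $\mathbf{x}$ the map $v \mapsto v\,g(\mathbf{x})$ is a bijection on the nonzero elements of the field.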

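Proof of Proposition 2.2. For $f \in \mathfrak{M}(\mathbb{F}_q^n)$, we define a family of sets $S_{f,\mathbf{c}} := \mathbf{c} + f(S)$ for $\mathbf{c} \in C$. Because
these sets are homogeneous, it suffices to focus on one set, for example, $S_{f,\mathbf{0}}$. We define the function
$\Phi_f : S \to \{0, 1\}$ by
\[
\Phi_f(\mathbf{s}) := 1\{\text{there exists } \mathbf{c} \in C \setminus \{\mathbf{0}\} \text{ such that } f(\mathbf{s}) = \mathbf{c} + f(\mathbf{s}') \text{ for some } \mathbf{s}' \in S\}.
\]
Then the number of elements in $S_{f,\mathbf{0}}$ that overlap with another $S_{f,\mathbf{c}}$ for some $\mathbf{c} \ne \mathbf{0}$ is $\sum_{\mathbf{s} \in S} \Phi_f(\mathbf{s})$.
Note that $\Phi_f(\mathbf{s})$ can be bounded above by
\[
\Psi_f(\mathbf{s}) := \sum_{\mathbf{c} \in C \setminus \{\mathbf{0}\}} \sum_{\mathbf{s}' \in S} 1\{f(\mathbf{s}) = \mathbf{c} + f(\mathbf{s}')\}.
\]
Then the average $\Phi(\mathbf{s})$ of $\Phi_f(\mathbf{s})$ over all $f \in \mathfrak{M}(\mathbb{F}_q^n)$ is bounded by
\[
\frac{1}{|\mathfrak{M}(\mathbb{F}_q^n)|} \sum_{f \in \mathfrak{M}(\mathbb{F}_q^n)} \Psi_f(\mathbf{s}) = \sum_{\mathbf{s}' \in S} U_{\mathbf{s},\mathbf{s}'},
\]
where
\[
U_{\mathbf{s},\mathbf{s}'} := \frac{1}{|\mathfrak{M}(\mathbb{F}_q^n)|} \sum_{\mathbf{c} \in C \setminus \{\mathbf{0}\}} \sum_{f \in \mathfrak{M}(\mathbb{F}_q^n)} 1\{f(\mathbf{s}) = \mathbf{c} + f(\mathbf{s}')\}.
\]
It is easy to show that
\[
U_{\mathbf{s},\mathbf{s}'} =
\begin{cases}
0 & \text{if } \mathbf{s} = \mathbf{s}', \\
\dfrac{A_{\mathrm{wt}(\mathbf{s}-\mathbf{s}')}(C)}{\binom{n}{\mathrm{wt}(\mathbf{s}-\mathbf{s}')}(q-1)^{\mathrm{wt}(\mathbf{s}-\mathbf{s}')}} & \text{otherwise.}
\end{cases}
\tag{16}
\]
From (1) it follows that $U_{\mathbf{s},\mathbf{s}'} \le n q^{-(n-k)}$ for $\mathbf{s} \ne \mathbf{s}'$, so
\[
\Phi(\mathbf{s}) \le n q^{-(n-k)} (|S| - 1).
\]
If $|S| \le q^{n-k}/(2n)$, we have
\[
\Phi(\mathbf{0}) < \frac{1}{2}
\]
and
\[
\sum_{\mathbf{s} \in S \setminus \{\mathbf{0}\}} \Phi(\mathbf{s}) \le n q^{-(n-k)} (|S| - 1)^2.
\]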



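By a similar argument to Proposition 2.1 we conclude that there exists $g \in \mathfrak{M}(\mathbb{F}_q^n)$ such that
\[
\Phi_g(\mathbf{0}) = 0
\]
and
\[
\sum_{\mathbf{s} \in S \setminus \{\mathbf{0}\}} \Phi_g(\mathbf{s}) < 2n q^{-(n-k)} (|S| - 1)^2.
\]
If we choose the set $\Phi_g^{-1}(0)$, then it is clear that conditions (1)-(3) hold.

If $S$ is invariant under any monomial map, then
\[
\Phi(\mathbf{s}) = \frac{1}{|\mathfrak{M}(\mathbb{F}_q^n)|} \sum_{f \in \mathfrak{M}(\mathbb{F}_q^n)} \Phi_{\mathrm{id}}(f(\mathbf{s}))
= \frac{\sum_{\mathbf{s}' : \mathrm{wt}(\mathbf{s}') = \mathrm{wt}(\mathbf{s})} \Phi_{\mathrm{id}}(\mathbf{s}')}{\binom{n}{\mathrm{wt}(\mathbf{s})}(q-1)^{\mathrm{wt}(\mathbf{s})}},
\]
and hence, conditions (1)'-(3)' follow from (16) with the set $\Phi_{\mathrm{id}}^{-1}(0)$, where $\mathrm{id}$ denotes the identity map on $\mathbb{F}_q^n$. The proof is complete. □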



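Proof of Proposition HOI (1) The inequality can be rewritten as
\[
1 \le \binom{n}{i}(q-1)^i \le q^n.
\]
The first inequality is clearly true, and the second comes from $q^n = \sum_{i=0}^{n} \binom{n}{i}(q-1)^i$.

(2) The statement is proved by observing that
\[
h_{q,n}(i+1) - h_{q,n}(i) = \Delta(i) := \log_q \frac{(q-1)(n-i)}{i+1}
\]
and
\[
\Delta(i)
\begin{cases}
> 0 & \text{for } i < \frac{(q-1)n-1}{q}, \\
= 0 & \text{for } i = \frac{(q-1)n-1}{q}, \\
< 0 & \text{for } i > \frac{(q-1)n-1}{q}.
\end{cases}
\]

(3) See Lemma A.1 and [11, Theorem 12.1.3].
□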
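
The computation behind part (2) can be written out in one line, assuming the definition $h_{q,n}(i) = \log_q\bigl(\binom{n}{i}(q-1)^i\bigr)$ used throughout; the block below only expands the ratio of consecutive terms and is meant as a reading aid.

```latex
\begin{align*}
\Delta(i) &= \log_q \frac{\binom{n}{i+1}(q-1)^{i+1}}{\binom{n}{i}(q-1)^{i}}
           = \log_q \frac{(q-1)(n-i)}{i+1}, \\
\Delta(i) > 0 &\iff (q-1)(n-i) > i+1 \iff i < \frac{(q-1)n-1}{q}.
\end{align*}
```

In particular, $h_{q,n}$ is nondecreasing up to $\lceil ((q-1)n-1)/q \rceil$ and nonincreasing afterwards.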
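Proof of Proposition 2.6. We only prove (4). The proofs of the other statements are left to the reader.

(4) We first prove the inequality in the case of $\mathrm{wt}(\mathbf{x} + \mathbf{y}) = 0$ or $n$. If $\mathrm{wt}(\mathbf{x} + \mathbf{y}) = 0$, then $\mathbf{x} = -\mathbf{y}$,
so that
\[
q^{h(\mathbf{x}+\mathbf{y})} = 1 \le \frac{\max\{\mathrm{wt}(\mathbf{x}), n - \mathrm{wt}(\mathbf{x})\}}{n} q^{h(\mathbf{x})} \le \beta(\mathrm{wt}(\mathbf{x}), \mathrm{wt}(\mathbf{y})) q^{h(\mathbf{x}) + h(\mathbf{y})}.
\]
If $\mathrm{wt}(\mathbf{x} + \mathbf{y}) = n$, then $\mathrm{wt}(\mathbf{x}) + \mathrm{wt}(\mathbf{y}) \ge n$, so that
\[
q^{h(\mathbf{x}+\mathbf{y})} = (q-1)^n \le \beta(\mathrm{wt}(\mathbf{x}), \mathrm{wt}(\mathbf{y})) q^{h(\mathbf{x}) + h(\mathbf{y})}.
\]

Now we shall prove the inequality by induction on $n$. The case of $n = 1$ has already been covered by
the above special cases. If $n \ge 2$, we can assume that $1 \le \mathrm{wt}(\mathbf{x} + \mathbf{y}) \le n - 1$. With no loss of generality,
we assume that $\mathrm{wt}(x_1 + y_1) = 0$ and $\mathrm{wt}(x_n + y_n) = 1$, and we define
\[
\mathbf{x}' = (x_2, \ldots, x_n), \quad \mathbf{x}'' = (x_1, \ldots, x_{n-1}), \quad
\mathbf{y}' = (y_2, \ldots, y_n), \quad \mathbf{y}'' = (y_1, \ldots, y_{n-1}).
\]
Supposing the inequality is true for $n - 1$, we have
\begin{align*}
q^{h(\mathbf{x}+\mathbf{y})}
&= q^{h(\mathbf{x}'+\mathbf{y}')} + (q-1) q^{h(\mathbf{x}''+\mathbf{y}'')} \\
&\le q^{h(\mathbf{x}')+h(\mathbf{y}')} + (q-1) q^{h(\mathbf{x}'')+h(\mathbf{y}'')} \\
&= \binom{n-1}{\mathrm{wt}(\mathbf{x}) - \mathrm{wt}(x_1)} \binom{n-1}{\mathrm{wt}(\mathbf{y}) - \mathrm{wt}(y_1)} (q-1)^{\mathrm{wt}(\mathbf{x}) + \mathrm{wt}(\mathbf{y}) - \mathrm{wt}(x_1) - \mathrm{wt}(y_1)} \\
&\quad + (q-1) \binom{n-1}{\mathrm{wt}(\mathbf{x}) - \mathrm{wt}(x_n)} \binom{n-1}{\mathrm{wt}(\mathbf{y}) - \mathrm{wt}(y_n)} (q-1)^{\mathrm{wt}(\mathbf{x}) + \mathrm{wt}(\mathbf{y}) - \mathrm{wt}(x_n) - \mathrm{wt}(y_n)} \\
&\le \max\left\{ \frac{(n - \mathrm{wt}(\mathbf{x}))(n - \mathrm{wt}(\mathbf{y}))}{n^2}, \frac{\mathrm{wt}(\mathbf{x})\mathrm{wt}(\mathbf{y})}{n^2 (q-1)^2} \right\} q^{h(\mathbf{x})+h(\mathbf{y})} \\
&\quad + (q-1) \max\left\{ \frac{(n - \mathrm{wt}(\mathbf{x}))\mathrm{wt}(\mathbf{y})}{n^2 (q-1)}, \frac{\mathrm{wt}(\mathbf{x})(n - \mathrm{wt}(\mathbf{y}))}{n^2 (q-1)}, \frac{\mathrm{wt}(\mathbf{x})\mathrm{wt}(\mathbf{y})}{n^2 (q-1)^2} \right\} q^{h(\mathbf{x})+h(\mathbf{y})} \\
&\le \beta(\mathrm{wt}(\mathbf{x}), \mathrm{wt}(\mathbf{y})) q^{h(\mathbf{x})+h(\mathbf{y})},
\end{align*}
as desired. □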
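
The first equality of the induction step is a Pascal-type identity for the quantities $q^{h_{q,n}(w)} = \binom{n}{w}(q-1)^w$, and the passage from $n - 1$ back to $n$ uses the ratios $\binom{n-1}{w}/\binom{n}{w} = (n-w)/n$ and $\binom{n-1}{w-1}/\binom{n}{w} = w/n$. The sketch below checks both facts on small parameters; it assumes this form of $h_{q,n}$, and the function names are illustrative.

```python
from math import comb

def area(q, n, w):
    """q^{h_{q,n}(w)}: number of vectors of Hamming weight w in F_q^n."""
    return comb(n, w) * (q - 1) ** w

def split_holds(q, n, w):
    """Check q^{h_{q,n}(w)} = q^{h_{q,n-1}(w)} + (q-1) q^{h_{q,n-1}(w-1)} for 1 <= w <= n-1."""
    return area(q, n, w) == area(q, n - 1, w) + (q - 1) * area(q, n - 1, w - 1)

def ratios_hold(n, w):
    """Check C(n-1,w)/C(n,w) = (n-w)/n and C(n-1,w-1)/C(n,w) = w/n, cross-multiplied."""
    return (n * comb(n - 1, w) == (n - w) * comb(n, w)
            and n * comb(n - 1, w - 1) == w * comb(n, w))

print(all(split_holds(q, n, w) and ratios_hold(n, w)
          for q in range(2, 6) for n in range(2, 11) for w in range(1, n)))
```

Dropping a coordinate where $\mathbf{x} + \mathbf{y}$ vanishes keeps the weight $w$, while dropping one where it does not vanish lowers the weight to $w - 1$, which is how the two terms of the split arise.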